Support soft target in softmax_cross_entropy #5595

Merged
merged 14 commits into from Oct 31, 2019

Conversation

@anaruse (Contributor) commented Oct 29, 2018

This PR aims to support "soft target" in softmax_cross_entropy.

The current softmax_cross_entropy implementation supports "hard targets" (integer class labels) but not "soft targets" (per-class probability distributions), which are becoming popular as a way to mitigate over-fitting. This PR allows users to pass a soft target to softmax_cross_entropy as follows.

soft_target_loss = F.softmax_cross_entropy(x, t, soft_target=soft_target)

The soft target loss is the KL divergence between the target distribution and the softmax output.
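
For reference, here is a minimal NumPy sketch of the soft target loss described above, i.e. the KL divergence between the target distribution t and softmax(x). The function name, the eps constant, and the batch-mean reduction are illustrative choices for this sketch, not the PR's internals.

import numpy as np

def soft_target_kl_loss(x, t, eps=1e-8):
    # Log-softmax of the logits along the class axis, computed stably.
    x_shifted = x - x.max(axis=1, keepdims=True)
    log_y = x_shifted - np.log(np.exp(x_shifted).sum(axis=1, keepdims=True))
    # Per-sample KL(t || softmax(x)); eps guards log(0) for zero probabilities.
    kl = (t * (np.log(t + eps) - log_y)).sum(axis=1)
    return kl.mean()

x = np.random.randn(4, 3).astype(np.float32)     # logits: 4 samples, 3 classes
t = np.full((4, 3), 1.0 / 3, dtype=np.float32)   # soft targets (rows sum to 1)
print(soft_target_kl_loss(x, t))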

@anaruse (Contributor Author) commented Oct 29, 2018

Or would it be better to implement this as a separate function, such as softmax_kl_divergence?

@anaruse (Contributor Author) commented Oct 30, 2018

I've updated the PR so that the argument t is used for both hard and soft targets. Whether t is a hard or soft target is determined from the ndim and shape of x and t; a sketch of the idea follows.
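
For illustration, a minimal sketch of a dispatch rule along these lines; the helper name is made up, and in the PR itself the detection happens inside the function rather than in a standalone helper.

import numpy as np

def is_soft_target(x, t):
    # Soft target: t matches x in both ndim and shape, i.e. one probability
    # distribution over classes per sample. Hard target otherwise
    # (integer class indices with one fewer dimension).
    return t.ndim == x.ndim and t.shape == x.shape

x = np.random.randn(4, 3).astype(np.float32)          # logits: 4 samples, 3 classes
t_hard = np.array([0, 2, 1, 1], dtype=np.int32)       # class indices
t_soft = np.full((4, 3), 1.0 / 3, dtype=np.float32)   # one distribution per sample

assert not is_soft_target(x, t_hard)
assert is_soft_target(x, t_soft)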

@anaruse changed the title from "[WIP] Support soft target in softmax_cross_entropy" to "Support soft target in softmax_cross_entropy" on Nov 1, 2018
@beam2d self-assigned this on Nov 13, 2018
@beam2d (Member) left a comment

The design and implementation look good. I added some minor comments.

t_type.ndim == x_type.ndim - 1,
if x_type.ndim == t_type.ndim and x_type.shape == t_type.shape:
# assume t is soft_target
self.soft_target = True
Member

Keep check_type_forward free of side effects. This method may be skipped entirely when CHAINER_TYPE_CHECK=0 is set.

x_type.dtype.kind == 'f',
t_type.dtype.kind == 'i',
t_type.ndim == x_type.ndim - 1,
if x_type.ndim == t_type.ndim and x_type.shape == t_type.shape:
Member

I feel it's better to branch based on the dtype kind and then check ndim/shape with expect. That will produce an error message that matches the user's intent (see the sketch below).
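
To illustrate the point in plain Python (this is deliberately not Chainer's type_check API, just a sketch of the intended behavior with a made-up helper name): branching on the dtype kind of t first, and only then validating ndim and shape, yields an error that matches what the caller meant.

import numpy as np

def check_inputs(x, t):
    # x: float logits of shape (N, C, d_1, ..., d_k).
    assert x.dtype.kind == 'f', 'x must be a float array of logits'
    if t.dtype.kind == 'i':
        # The caller intends a hard target: integer class indices.
        assert t.ndim == x.ndim - 1, 'hard target must have ndim == x.ndim - 1'
        assert t.shape == x.shape[:1] + x.shape[2:], 'hard target shape mismatch'
    else:
        # The caller intends a soft target: a distribution with the shape of x.
        assert t.dtype.kind == 'f', 'soft target must be a float array'
        assert t.shape == x.shape, 'soft target must have the same shape as x'

x = np.random.randn(4, 3).astype(np.float32)
check_inputs(x, np.array([0, 2, 1, 1], dtype=np.int32))      # hard target: OK
check_inputs(x, np.full((4, 3), 1.0 / 3, dtype=np.float32))  # soft target: OK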

def _soft_target_loss(self, xp, x, t, log_y):
kl_d = xp.sum(t * (xp.log(t + self.eps) - log_y), axis=1)
if self.reduce == 'mean':
self._coeff = 1.0 / (numpy.prod(x.shape) / x.shape[1])
Member

Suggested change
self._coeff = 1.0 / (numpy.prod(x.shape) / x.shape[1])
self._coeff = 1.0 / (x.size / x.shape[1])

size can be used to get the total number of elements.

return kl_d.reshape(()),
else:
shape = (x.shape[0],) + x.shape[2:]
return kl_d.reshape(shape),
Member

Why is this reshape needed?

self.check_backward_options = {}

def check_forward(self, xp):
pass
Member

Suggested change
pass
raise NotImplementedError

to ensure this method is overridden.

t_hard_shape = (self.nb,) + self.shape[1:]
self.t_hard = numpy.random.randint(
0, self.shape[0], t_hard_shape).astype(numpy.int32)
t = numpy.zeros(numpy.prod(self.x.shape)).astype(self.dtype)
Member

Suggested change
t = numpy.zeros(numpy.prod(self.x.shape)).astype(self.dtype)
t = numpy.zeros(self.x.size).astype(self.dtype)

@anaruse (Contributor Author) commented Nov 20, 2018

Thanks for your comments. I've updated the branch based on your feedback. Please review it again.

@beam2d (Member) commented Nov 27, 2018

Thank you for the updates. Looks good to me. Could you add a description of the soft target support to the docstring?

@anaruse (Contributor Author) commented Nov 27, 2018

I've just added a description of the soft target support. Please check it again.

@toslunar (Member)

Could you resolve merge conflicts?

@anaruse (Contributor Author) commented Jan 15, 2019

Sorry for the late reply. I've resolved the conflicts with the master branch. Could you check it again?

stale bot commented Apr 15, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs. Thank you for your contributions.

The stale bot added the "stale" (Not updated for a longer period of time.) label on Apr 15, 2019
@beam2d (Member) commented Apr 16, 2019

Bump. Sorry for the late response. Could you resolve the conflict again?

The stale bot removed the "stale" (Not updated for a longer period of time.) label on Apr 16, 2019
@anaruse (Contributor Author) commented Apr 17, 2019

I've just resolved the conflicts with the master branch. Could you check it?

@beam2d (Member) commented Jul 3, 2019

Sorry for the late reply. The code looks good to me, but I now have a concern about the naming. A function named "cross entropy" that actually computes a KL divergence is confusing and quite likely to be misused. You asked above whether it would be better to call it softmax_kl_divergence; that name sounds better to me, but given the current implementation I also think adding a new function is overkill. How about adding an option to turn the negative entropy term on and off (i.e., to switch between a cross-entropy mode and a KL-divergence mode)? The option only affects the soft target case, because cross entropy and KL divergence coincide for hard targets.
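
A small NumPy sketch of the relationship behind this suggestion (variable names are illustrative): the soft-target cross entropy decomposes as H(t, y) = H(t) + KL(t || y), so the proposed option amounts to toggling the entropy term of t.

import numpy as np

eps = 1e-8
x = np.random.randn(4, 3).astype(np.float32)                     # logits
t = np.random.dirichlet(np.ones(3), size=4).astype(np.float32)   # soft targets

x_shifted = x - x.max(axis=1, keepdims=True)
log_y = x_shifted - np.log(np.exp(x_shifted).sum(axis=1, keepdims=True))

cross_entropy = -(t * log_y).sum(axis=1)                      # H(t, y)
entropy_t = -(t * np.log(t + eps)).sum(axis=1)                # H(t)
kl_divergence = (t * (np.log(t + eps) - log_y)).sum(axis=1)   # KL(t || y)

# H(t, y) == H(t) + KL(t || y), up to the eps used for log(0) safety.
assert np.allclose(cross_entropy, entropy_t + kl_divergence, atol=1e-5)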

@@ -223,21 +244,32 @@ def forward_gpu(self, inputs):
ret = ret.reshape(t.shape)
return ret,

def _soft_target_loss(self, xp, x, t, log_y):
kl_d = xp.sum(t * (xp.log(t + self.eps) - log_y), axis=1)
Member

To compute the cross entropy,

Suggested change
kl_d = xp.sum(t * (xp.log(t + self.eps) - log_y), axis=1)
____ = -xp.sum(t * log_y, axis=1)

@anaruse (Contributor Author) commented Jul 16, 2019

Now I'm wondering whether this is correct, since the test below fails with it.

class TestSoftTargetExpectNearZero(BaseSoftTarget, unittest.TestCase):

This test uses the output of softmax as the soft target, so the output of softmax_cross_entropy is expected to be zero or nearly zero. However, softmax_cross_entropy returns a non-zero value when 'cross-entropy' is used to compute the soft target loss.
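
A quick NumPy check of this observation (illustrative only): when the soft target is exactly softmax(x), the KL divergence is about zero, while the cross entropy equals the entropy of the target, which is generally non-zero.

import numpy as np

x = np.random.randn(4, 3).astype(np.float32)
x_shifted = x - x.max(axis=1, keepdims=True)
log_y = x_shifted - np.log(np.exp(x_shifted).sum(axis=1, keepdims=True))
t = np.exp(log_y)                                    # soft target = softmax(x)

kl = (t * (np.log(t + 1e-8) - log_y)).sum(axis=1)    # ~0 for every sample
ce = -(t * log_y).sum(axis=1)                        # equals H(t), non-zero

print(kl)  # close to zero
print(ce)  # non-zero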

Member

Done in 1ab2632.

@toslunar (Member) commented Jul 4, 2019

BTW, chainer.distributions.Categorical is available.

@anaruse (Contributor Author) commented Jul 9, 2019

All right, I added a soft_target_loss option so that you can choose which method is used to compute the soft target loss: 'cross-entropy' or 'kl-divergence'. What do you think about this option?
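
With this option, a call could look like the following usage sketch, assuming a Chainer build that includes this change; the soft target t has the same shape as x, and the option values are the ones named above.

import numpy as np
import chainer.functions as F

x = np.random.randn(4, 3).astype(np.float32)
t = np.full((4, 3), 1.0 / 3, dtype=np.float32)   # soft target, same shape as x

loss_kl = F.softmax_cross_entropy(x, t, soft_target_loss='kl-divergence')
loss_ce = F.softmax_cross_entropy(x, t, soft_target_loss='cross-entropy')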

@beam2d (Member) commented Jul 16, 2019

Thanks for the fix!
CI, test this please.

@chainer-ci (Member)

Jenkins CI test (for commit 1ab2632, target branch master) failed with status FAILURE.

if self.soft_target_loss == 'kl-divergence':
ret = xp.sum(t * (xp.log(t + self.eps) - log_y), axis=1)
else:
ret = -xp.sum(t * log_y), axis=1)
Member

It looks like there is a syntax error here.

Contributor Author

Ah, sorry, that was careless of me. I'll fix it soon.

@beam2d (Member) commented Jul 29, 2019

Jenkins, test this please.

@chainer-ci (Member)

Jenkins CI test (for commit f847214, target branch master) failed with status FAILURE.

@beam2d (Member) commented Jul 29, 2019

It looks like the test is still failing. Could you check it?

@anaruse (Contributor Author) commented Jul 30, 2019

The CI test fails at TestSoftTargetExpectNearZero:

class TestSoftTargetExpectNearZero(BaseSoftTarget, unittest.TestCase):

It fails when 'cross-entropy' is used to compute the soft target loss. The test expects the loss value to be zero, but it becomes non-zero with 'cross-entropy'. I think either this test is not appropriate for 'cross-entropy', or the 'cross-entropy' computation is not correct.

What do you think about this?

@anaruse (Contributor Author) commented Oct 9, 2019

Sorry for being very late.
I fixed the issue with the TestSoftTargetExpectNearZero unit test when cross-entropy is selected for the soft target loss by splitting it into two tests, one for kl-divergence and another for cross-entropy (roughly as sketched below). I believe no problems remain. Could you check this again?
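
A rough sketch of the kind of split described here; the class names and tolerances below are hypothetical, not the PR's actual test code, and it assumes a Chainer build that includes this change.

import unittest

import numpy as np

import chainer.functions as F


class SoftTargetNearZeroBase(object):

    soft_target_loss = None  # set by the concrete test classes

    def setUp(self):
        self.x = np.random.randn(4, 3).astype(np.float32)
        self.t = F.softmax(self.x).array  # soft target equals softmax(x)

    def test_loss(self):
        loss = F.softmax_cross_entropy(
            self.x, self.t, soft_target_loss=self.soft_target_loss).array
        if self.soft_target_loss == 'kl-divergence':
            # KL(t || softmax(x)) vanishes when t == softmax(x).
            np.testing.assert_allclose(loss, 0.0, atol=1e-5)
        else:
            # The cross entropy equals the (non-zero) entropy of t here.
            entropy = float(
                -(self.t * np.log(self.t + 1e-8)).sum(axis=1).mean())
            np.testing.assert_allclose(loss, entropy, atol=1e-4)


class TestSoftTargetNearZeroKL(SoftTargetNearZeroBase, unittest.TestCase):
    soft_target_loss = 'kl-divergence'


class TestSoftTargetEntropyCE(SoftTargetNearZeroBase, unittest.TestCase):
    soft_target_loss = 'cross-entropy'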

@beam2d (Member) commented Oct 17, 2019

Jenkins and flexCI, test this please.

@chainer-ci (Member)

Jenkins CI test (for commit 620b55d, target branch master) succeeded!

@toslunar self-requested a review on October 29, 2019 06:30
@toslunar (Member) left a comment

LGTM

@beam2d merged commit 2659ca2 into chainer:master on Oct 31, 2019
@beam2d added the "cat:feature" (Implementation that introduces new interfaces.) label on Oct 31, 2019
@beam2d added this to the v7.0.0 milestone on Oct 31, 2019
@beam2d (Member) commented Oct 31, 2019

Thank you!!!

@anaruse (Contributor Author) commented Oct 31, 2019

Thank you for merging the PR!
